Remarks

Regarding point 2 of the previous Office Action

Previously filed claim 26 was rejected under 37 CFR 1.75(c) as being an improper dependent claim for
failing to further limit the subject matter of a previous claim. The limitation "PCR primers" was rejected
as an intended use limitation in a product claim. Applicants have responded by amending claim 26. The
claim no longer recites the limitation "PCR primers".

Regarding point 3 of the previous Office Action 

Claims 29, 31, 43 and 51 were rejected as indefinite under point 3 (A) due to the language "such as".
Applicants have responded by amending these claims and removing the language "such as". Further 
discussion of this language is given below. 

Claims 29, 31, 43 and 51 were further rejected as indefinite under point 3 (B) because of the language
"paleospecies", "ecospecies", "agamospecies" and "ecosystem species". Clarification was requested. 
The applicants offer the following clarification. The applicants respectfully submit that the scope of these 
terms is definite. Paragraphs [0057] and [0058] of the specification relate to species and creatures. The 
terms appear in paragraph [0058]. These terms are known in the art of genetics. Applicants respectfully 
refer the Examiner to the enclosed definition of the terms "hybrid" and "species" on pages 188 and 364 
respectively of the book A Dictionary of Genetics , 3rd Edition, (Oxford University Press, 1985, eds. R.C. 
King and W.D. Stansfield). The species definition uses the language above and further defines the 
language. Paragraph [0057] of the specification states "The term creature means any organism that is
living or was alive at one time."

Page 188 of A Dictionary of Genetics defines the term "hybrid" as "an offspring from genetically 
dissimilar parents, perhaps even different species". Paragraph [0058] of the specification makes clear
that the term "species hybrid" refers specifically to an offspring of different species, such as mules.
Applicants respectfully submit that the language "species hybrid such as mules" is definite in the art of 
genetics, see p. 57 ("Animal Species Hybrid") and p. 469 (the "Hinny" or "Mule") of the Genetics Manual 
(G. P. Redei, World Scientific, 1998). In order to expedite claim allowance, however, the applicants have 
removed the language "such as mules" from the claims. The term "species hybrid" also includes plant 
species hybrids. See pp. 359-361 of Hybrid Origins of Plant Species (Annu. Rev. Ecol. Syst. 1997, 28: 
359-89 by Loren Rieseberg); see specifically "What is a Hybrid Species?" Bottom p. 360, top p. 361. 

Claims 44-51, especially independent claim 44, were rejected as indefinite under point 3 (C) due to the
language "is used by the apparatus to determine the data". The applicants have amended the claim by
deleting this phrase, as was suggested by the Examiner in the previous Office Action.

Some further remarks 

The applicants hereby respectfully submit some further remarks to aid the examination of the claims. 

Regarding claim 7, the limitation "and a population and the population is a group of individuals as in the
field of population genetics" has been eliminated from this claim and other claims. As seen from 
paragraphs [0175]-[0176], a CL-F region need not necessarily be limited by such a limitation. 

Regarding claim 8, the limitation "wherein the width of the subrange of the segment-subrange is less
than 0.5 and whereby the segment of the segment-subrange is a chromosome segment and the length
of the chromosome segment is less than or equal to the length of the chromosome" has been added to
this claim and others. As seen from paragraphs [0062], [0063] and [0095], the specification necessarily
describes subranges within and smaller than the range 0 to 0.5. The widths of these subranges are
necessarily less than 0.5, because the full range itself has a width of only 0.5 - 0 = 0.5. The length of a
chromosome segment is less than or equal to the length of a chromosome, see [0275], bottom p. 20.

Regarding claim 17, the amended claim recites "A copy of a set of oligonucleotides ... wherein each
oligonucleotide in the set is a type (1) complementary oligonucleotide or wherein each oligonucleotide in
the set is a type (2) complementary oligonucleotide, wherein each bi-allelic covering marker is an exact,
true bi-allelic marker". Applicants respectfully submit that "a copy" is described by "one or more copies"
[0255], [0265]. Type (1) and type (2) complementary oligonucleotides are described in paragraphs
[0142] and [0143].

Type (1) oligonucleotides are used, for example, as sequence specific type oligonucleotides. Examples 
of their use are given in [0324] and [0349]. Type (2) oligonucleotides are used, for example, as standard 
PCR primers, see [0143]. Some examples of their use are given in [0144] and [0249].

The term "bi-allelic markers" in the art generally means exact, true bi-allelic markers. (For example, 
SNPs are examples of such exact, true bi-allelic markers.) The specification, however, also expands the 
term "bi-allelic marker" somewhat and describes bi-allelic marker equivalents or BMEs (mathematical 
markers formed from one or more markers that act like they are bi-allelic) and approximate bi-allelic 
markers, see for example [0054] and [0055]. 

Regarding claim 26

This claim includes the newly added limitation "wherein thousands of the covering markers are from one 
chromosome". Other claims such as claims 38, 49, and 52 also include this limitation or a similar 
limitation. 

As stated on page 18 of the previously filed Supplemental Amendment/Response of Nov. 20, 2005, at
the time the application was filed the whole field of association studies was looking to use thousands of
bi-allelic markers. See for example Risch, N. and Merikangas, K.: The Future of Genetic Studies of
Complex Human Diseases. Science, 13 September 1996, vol. 273, pp. 1516-1517, cited in [0027] of the
application. This Risch paper (see p. 1517, middle of the leftmost column) describes using technological
advances to do association testing of five diallelic (or bi-allelic) polymorphisms within each of 100,000
genes (a total of 500,000 polymorphisms tested in the association study). This amounts to 20,000 or more
markers on a chromosome. And the inventor's paper is a generalization of the Risch and Merikangas
analysis [0029]. A copy of the Risch paper was supplied with the Amend/Resp of 11/05 and is also
included with this document.
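
By way of illustration only, the arithmetic behind the "20,000 or more" figure can be sketched as
follows. The chromosome count of 24 (22 autosomes plus X and Y) is an assumption that does not appear
in the remarks above, as is the even spread of markers across chromosomes; all names are hypothetical.

```python
# Illustrative check of the per-chromosome marker count implied by the Risch paper.
GENES = 100_000              # genes, as described in the Risch paper
POLYMORPHISMS_PER_GENE = 5   # diallelic polymorphisms tested per gene
DISTINCT_CHROMOSOMES = 24    # ASSUMPTION: 22 autosomes + X + Y (not stated in the remarks)


def markers_per_chromosome(genes, per_gene, chromosomes):
    """Average markers per chromosome, ASSUMING markers are spread evenly."""
    return genes * per_gene / chromosomes


total_markers = GENES * POLYMORPHISMS_PER_GENE
average = markers_per_chromosome(GENES, POLYMORPHISMS_PER_GENE, DISTINCT_CHROMOSOMES)
print(f"total markers: {total_markers}, average per chromosome: {average:.0f}")
```

Under these assumptions the average is a little under 21,000 markers per chromosome, consistent with
the figure stated above.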

Another example of the expectation (at the time of filing) in the field of using large numbers of markers
(e.g., thousands) from high-density marker maps is the Kruglyak paper. The Kruglyak paper (The use of
a genetic map of biallelic markers in linkage studies, published 9/97, see footnote 4, p. 3 of the present
application) is quoted in the application (see mid [0026]) as predicting a density of at least 1,000 SNPs
(bi-allelic markers) per cM. Since a human chromosome is about 150 cM in length, a density of at
least 1,000 bi-allelic markers per cM corresponds to at least about 150,000 bi-allelic markers on a
chromosome (or at least 3 million bi-allelic markers in the genome). Page 21 (right column) of the
Kruglyak paper essentially predicts that this large number (and density) of markers could be practically
genotyped using more automated genotyping techniques (p. 24 Kruglyak, endnotes 10, 12, 14, 15, 16)
that are also described in the present application under Oligonucleotide Technology [0249] in endnote
11 (application p. 25) in references (1) Chee, (2) Saiki, (3) Wu, and (4) Nickerson. (A copy of the
Kruglyak paper was supplied with the Amend/Resp of 11/05 and is also included with this document.)
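
By way of illustration only, the marker counts that follow from the predicted density can be checked
with a short sketch. The density and the chromosome length are the figures stated above; the genome
length in cM is not stated in the remarks and is merely back-calculated here from the 3 million
figure. All names are hypothetical.

```python
# Illustrative check of the marker counts implied by the Kruglyak density prediction.
SNPS_PER_CM = 1_000         # predicted minimum density (markers per cM)
CHROMOSOME_CM = 150         # approximate chromosome length stated in the remarks
GENOME_MARKERS = 3_000_000  # genome-wide figure stated in the remarks


def markers_in_region(density_per_cm, length_cm):
    """Number of markers in a region of the given genetic length."""
    return density_per_cm * length_cm


per_chromosome = markers_in_region(SNPS_PER_CM, CHROMOSOME_CM)
# Genome length implied by the 3 million figure (a derived value, not a quoted one).
implied_genome_cm = GENOME_MARKERS / SNPS_PER_CM
print(f"markers per chromosome: {per_chromosome}")
print(f"implied genome length: {implied_genome_cm:.0f} cM")
```
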

Another example of this expectation in the field is the Chee paper. The Chee paper ([0341], endnote 8)
is part of this application through incorporation by reference. The Chee paper describes a high-density
array (or "gene chip") "that could query the entire coding content of the human genome, estimated at
100,000 genes" (see p. 613, last sentence of the paper). A total of 100,000 genes necessarily
means more than 5,000 markers per chromosome. The Chee paper also describes high-resolution
marker maps (p. 613, leftmost column). The Chee paper is cited in the application in connection with
using thousands of bi-allelic markers in the new two-dimensional techniques of this application, see
[0249] and [0324]. More specifically, paragraph [0325] describes the use of "gene chips" as a physical
implementation used to scan a particular chromosome or chromosomal region. Since the markers for
scanning the particular chromosome or chromosomal region must be from the chromosome, the
limitation "wherein thousands of the covering markers are from one chromosome" is supported. (Copies
of pp. 610 and 613 of the Chee paper were included for the Examiner's convenience in the
previously filed Amendment/Response of 9/13/05 and are also included with this document.)

In addition, the specification describes using more dense marker coverings in [0182] and [0183], 
wherein N is large and the covering distance δ is small. And these are described in conjunction with
larger CL-F regions that are N covered. Examples of CL-F regions that are N covered include
segment-subranges [0185] (wherein, as stated above, the segment covered is less than or equal in length
to the length of a chromosome). Other examples, namely CL-F regions wherein the chromosomal location
coordinates of the CL-F region range over a chromosome or part of a chromosome [075], further support
the limitation "wherein thousands of the covering markers are from one chromosome".

Regarding claim 27, the limitations 0.2 and 12 cM are supported, for example, by [0180] and [0181].

Regarding claim 37, this claim has the same scope as previously allowed old claim 38. As is apparent
from the claim listing, claim 37 simply incorporates the limitation "wherein N > 2" from previously allowed
old claim 38.

Regarding claim 46, the limitation "oligonucleotides bound to a glass slide or silicon chip" is already
present as one limitation in claim 44, from which claim 46 depends. See also paragraphs [0144] and
[0323]. The limitation "wherein each covering marker is an exact, true bi-allelic marker" is discussed
above.

Conclusion



This Amendment/Response has responded to each point of rejection in the previous Office Action of 
12/29/05. The applicants have amended several claims and five new claims have also been added. 
Appropriate fees are also enclosed. 

For the reasons advanced above, applicants respectfully submit that the application is now in condition 
for allowance and that action is earnestly solicited. 

Respectfully submitted, 



Robert O. McGinnis 
Registration No. 44,232

May 30, 2006 
1575 West Kagy Blvd.
Bozeman, MT 59715
tel (406)-522-9355 

Enclosures: 

1) A Dictionary of Genetics (3rd Edition, Oxford University Press, 1985) Title page and pp. 188 and 364.
(3 sheets total) 

2) Genetics Manual (World Scientific, 1998) Title page and pp. 57 and 469. (3 sheets total) 

3) Hybrid Origins of Plant Species by Loren Rieseberg (Annu. Rev. Ecol. Syst. 1997. 28:359-89) pp.
359-361 (3 sheets total) 

4) Future of Genetic Studies of Complex Human Diseases by Risch, N. & Merikangas, K. (Science vol. 
273 13 September 1996 pp. 1516-1517) pp. 1516 & 1517 (total 2 sheets) 

5) The use of a genetic map of biallelic markers in linkage studies by L. Kruglyak (Nature Genetics vol. 
17, Sept. 1997, pp. 21-24) pp. 21-24 (total of 4 sheets) 

6) Accessing Genetic Information with High-Density DNA Arrays by Chee, M. et al. (Science vol. 274,
25 Oct. 1996, pp. 610-614) pp. 610 and 613 (2 sheets) 

17 total enclosure sheets 

SOS boxes the operator sequences in E. coli DNA that are recognized by a repressor
called the LexA protein. This protein represses several loci involved in DNA repair
functions. See SOS response.



SOS response an error-prone mechanism of repairing damaged DNA in E. coli by
coordinated induction of several enzymes. Damaged DNA somehow activates an enzyme
called RecA protease, and this protease cleaves a protein called LexA repressor. The
genes involved in repair functions become activated when this repressor is cleaved. See
SOS boxes.

Southern blotting a technique, developed by E.M. Southern, for transferring
electrophoretically resolved DNA segments from an agarose gel to a nitrocellulose filter paper
via capillary action. Subsequently the DNA segment of interest is probed with a radioactive,
complementary nucleic acid, and its position is determined by autoradiography. A
similar technique, referred to as northern blotting, is used to identify RNAs. For example,
an electropherogram containing a multitude of different mRNAs could be probed with a
radioactive cloned gene. In cases where proteins have been separated electrophoretically,
a specific protein on an electropherogram can be identified by the western blotting
procedure. In this case the probe is a radioactively labeled antibody raised against the protein
in question.

sow the adult female of swine. 

spacer DNA untranscribed segments of eukaryotic and some viral genomes flanking
functional genetic regions (cistrons). Spacer segments usually contain repetitive DNA. The
function of spacer DNA is not presently known, but it may be important for synapsis. See
transcribed spacer.

spawn to deposit eggs. 

spay to remove the ovaries. 

special creation a nonscientific philosophy asserting that each species has originated 
through a separate act of divine creation by processes that are not now in operation in the 
natural world. 

specialized 1. an organism having a narrow range of tolerance for one or more ecological 
conditions. 2. a species having a relatively low potential for further evolutionary change;
the opposite of generalized.

specialized transduction See transduction. 

speciation 1. the splitting of an ancestral species into daughter species that co-exist in 
time; horizontal evolution or speciation; cladogenesis. 2. the gradual transformation of one 
species into another without an increase in species number at any time within the lineage; 
vertical evolution or speciation; phyletic evolution or speciation. 

species 1. biological (genetic) species: reproductively isolated systems of breeding
populations. 2. paleospecies (successional species): distinctly different appearing assemblages of
organisms as a consequence of species transformation (q.v.). 3. taxonomic (morphological;
phenetic) species: phenotypically distinctive groups of coexisting organisms. 4. microspecies
(agamospecies): asexually reproducing organisms (mainly bacteria) sharing a common
morphology and physiology (biochemistry). 5. biosystematic species (ecospecies; coenospecies):
populations that are isolated by ecological factors rather than ethological isolation
(q.v.).

species group superspecies (q.v.).

species selection a form of group selection (q.v.) in which certain species (produced by 
cladogenesis) continue the cladogenic process and others become extinct. 

Animal models continued

pigmentation of the retina, h. chr. 6p21.2-cen, mouse gene RD2, m. chr. 17), gonadal
dysgenesis (underdeveloped germ cells in the testes, h. chr. Y11.2-pter, mouse gene Sry, m. chr.
Y), tyrosinase negative oculocutaneous albinism (see albinisms, h. chr. 11q14-q21, mouse
gene Tyr^c, m. chr. 7). By disruption of hexosaminidase α subunit a model for the Tay-Sachs
disease has been generated in mouse. Interestingly, these animals suffered no obvious
behavioral or neurological deficit. Disrupting the hexosaminidase β subunit (Sandhoff disease
model) resulted in massive depletion of spinal cord axons and neuronal storage of ganglioside

MOUSE POLYGENIC DISORDERS WITH SIMILARITIES TO HUMAN CONDITIONS
[human problem - mouse strain]: alcoholism and opiate drug addictions - C57BL/6J, asthma -
A/J, atherosclerosis - C57BL, audiogenic (sound-induced) seizures - DBA, cleft palate (fissure
[illegible]) - [illegible], deafness - LP, dental disease - C57BL, BALB/c, diabetes - NOD, epilepsy
- EL, SWXL-4, granulosa cell tumors in the ovary - SWR, germ cell tumors in the ovary -
LT, germ cell tumors in the testes - 129, hemolytic anemia - NZB, hepatitis - BALB/c,
[illegible] disease (pre-B cell lymphoma) - SJL, hypertension - MA/My, kidney adenocarcinoma
- BALB/cCd, leprosy (Mycobacterium leprae) - BALB/c, leukemia - AKR/J, C58/J, P/J, lung
tumors - A, Ma/My, measles - BALB/c, osteoporosis - DBA, polygenic obesity - NZB NZW,
pulmonary tumors - A/J, rheumatoid arthritis - MRL/Mp, spina bifida (defect of the bones of
the spinal cord) - CT, systemic lupus erythematosus (a skin degeneration) - NZB NZW,
whooping cough (pertussis) - BALB/c. (Some of the data by courtesy of GIBCO BRL Co.)
ANIMAL POLE: is dorsal end of the animal egg opposite the lower end, the vegetal pole and 
where the sperm entry is located. After the entry, the egg cortex rotates slightly and in some 
species at the side opposite the entry a gray crescent is formed. (See vegetal pole) 
ANIMAL SPECIES HYBRIDS: the most familiar example is the hybrids of the mare (Equus
caballus 2n = 64) and the jackass (Equus asinus, 2n = 62), and the stallion and the she-ass.
The hybrid males do not produce viable sperm although they may show normal libido. The
females may have estrus and ovulate but there is no proven cases of fertility. Zebras (2n = 44)
also may form hybrids with both donkeys and horses. Buffalo (Bison bison, 2n = 60) may be
crossed reciprocally with cattle (Bos taurus, 2n = 60) but their offspring (cattalo) has reduced
fertility. The domesticated pig (Sus scrofa, 2n = 38) forms fertile hybrids with several wild pigs
with the same number of chromosomes. The sheep (Ovis
aries, 2n = 54) interbreeds with the wild mouflons but the 
sheep x goat (Capra hircus, 2n = 60) hybrid embryo only 
rarely can be kept alive. Some monkeys can be interbred 
but primates are generally sexually isolated. There is no 
sexual barrier among the various human races, indicating 
close relationship but no hybrids are known between 
humans and any other species. These general rules do not 
hold for somatic cell hybrids because human cells can be 
fused with rodent or plant cells but they cannot be regener- 
ated or even maintained successfully for indefinite periods 
of time. The hybridization barrier is not identical with other 
functional barriers. 






ANIMAL TRANSFORMATION VECTORS: most commonly Simian virus 40 (SV40) and 
Bovine papilloma virus (BPV) based vectors are used. The BPV vectors can be used for the 

HIMALAYAN RABBIT



HindIII: restriction enzyme with recognition site A↓AGCTT

(histone nuclear factor): a 48 K Mr protein, identical to
interferon regulatory factor IRF-2. (See IRF-2, histones)
HINGE: see antibody 

HINNY: [illegible] (2n = 64) [illegible]. The reciprocal (mare x jackass) is called
[illegible]. The jackass willingly mates with the mare but the stallion mates with the
she-ass only under special circumstances (blindfolded). The hybrids'
body resembles closer the female parent as an apparent cytoplasmic influence. These sterile
[illegible] retain some sexual drive and
[illegible] reported in backcrosses with either the jackass or the stallion.
The jackass backcrosses are entirely sterile but the backcrosses with stallions appear more nor

Abstract

The origin of new homoploid species via hybridization is theoretically difficult be- 
cause it requires the development of reproductive isolation in sympatry. Nonethe- 
less, this mode is often and carelessly used by botanists to account for the for- 
mation of species that are morphologically intermediate with respect to related 
congeners. Here, I review experimental, theoretical, and empirical studies of 
homoploid hybrid speciation to evaluate the feasibility, tempo, and frequency of 
this mode. Theoretical models, simulation studies, and experimental syntheses 
of stabilized hybrid neospecies indicate that it is feasible, although evolution- 
ary conditions are stringent. Hybrid speciation appears to be promoted by rapid 
chromosomal evolution and the availability of a suitable hybrid habitat. A selfing 
breeding system may enhance establishment of hybrid species, but this advantage 
appears to be counterbalanced by lower rates of natural hybridization among self- 
ing taxa. Simulation studies and crossing experiments also suggest that hybrid 
speciation can be rapid — a prediction confirmed by the congruence observed be- 
tween the genomes of early generation hybrids and ancient hybrid species. The 
frequency of this mode is less clear. Only eight natural examples in plants have 
been rigorously documented, suggesting that it may be rare. However, hybridiza- 
tion rates are highest in small or peripheral populations, and hybridization may be 
important as a stimulus for the genetic or chromosomal reorganization envisioned 
in founder effect and saltational models of speciation. 



INTRODUCTION 

Hybridization may have several evolutionary consequences, including increased 
intraspecific genetic diversity (2), the origin and transfer of genetic adaptations 
(2, 93), the origin of new ecotypes or species (42, 102), and the reinforcement
or breakdown of reproductive barriers (27, 55, 77). Although the frequency and 
importance of these outcomes are not yet clear in either plants or animals, a 
critical body of data is now available for assessing the mechanistic basis and 
frequency of one of these — the origin of new species. The last comprehensive 
review of this topic in relation to plants was Grant's (42) monograph "Plant 
Speciation." Grant listed six mechanisms by which the breeding behavior of 
hybrids could be stabilized, thus providing the potential for speciation: 

1. asexual reproduction; 

2. permanent translocation heterozygosity; 

3. permanent odd polyploidy; 

4. allopolyploidy; 

5. the stabilization of a rare hybrid segregate isolated by postmating barriers; 

6. the stabilization of a rare hybrid segregate isolated by premating barriers. 

The first three of these mechanisms generate flocks of clonal or uniparental 
microspecies that span the range of morphological variability between the 
parental species. Sexual reproduction among microspecies is limited or absent,
making it difficult to discuss their origin and evolution in the context
of sexual isolation and speciation. By contrast, the latter three mechanisms 
generate sexual derivatives and therefore have the potential to give rise to new 
biological species. 

This review focuses on the origin of sexual, homoploid hybrid species (mech- 
anisms 5 and 6), (but see 50, 89 for reviews of polyploidy in plants). After 
clarification of concepts and terminology, the historical basis of our current un- 
derstanding of hybrid speciation is reviewed. This is followed by examination 
of the frequency of natural hybridization and an exploration of experimental 
and theoretical studies that test the feasibility of homoploid hybrid speciation. 
Once the feasibility of this mode of speciation has been established, I briefly
critique the methods used for identifying homoploid hybrid species in nature 
and then focus on those examples of hybrid speciation that are well established. 
Finally, I discuss promising areas for future research and possible approaches 
that may facilitate studies of this mode. 

WHAT IS A HYBRID SPECIES? 

Both "hybrid" and "species" can have several meanings for evolutionary bi- 
ologists. The term hybrid can be restricted to organisms formed by cross- 
fertilization between individuals of different species, or it can be defined more 
broadly as the offspring between individuals from populations "which are
distinguishable on the basis of one or more heritable characters" (44). I prefer this
broader definition of hybrids, as it provides greater flexibility in usage. Nonetheless,
in this review, I focus on hybrids formed by crosses between species.
The term species has a much wider variety of definitions, ranging from con- 
cepts based on the ability to interbreed to those based on common descent. 
Mayr's (59) biological species concept — "species are groups of interbreed- 
ing natural populations which are reproductively isolated from all other such 
groups" — is perhaps the most widely accepted of these. Although I have previ- 
ously expressed concern about the limitations of this concept (73), its emphasis 
on reproductive isolation does offer a straightforward approach to the study of 
speciation (20). Moreover, the evolution of reproductive barriers is particularly 
crucial to the successful origin of new hybrid species; otherwise, the new hy- 
brid lineage will be swamped by gene flow with its parents. Thus, the focus 
of this review is on the evolution of reproductive isolation between new hybrid 
lineages and their parents. 

HISTORICAL PERSPECTIVE 

The hypothesis that new species may arise via hybridization appears to have 
originated with Linnaeus (58; cited in 84), who wrote "it is impossible to doubt 
that there are new species produced by hybrid generation. ... For thence it 
appears to follow, that the many species of plants in the same genus in the 
beginning could not have been otherwise than one plant, and have arisen from 
this hybrid generation." This represents a modification of the orthodox view 
of special creation, which asserted that all existing species were created by the 
hand of God and which denied the existence of constant hybrids (15). However,
Linnaeus' observations were limited to F1 hybrids, and he was unaware of
potential difficulties with his hypothesis such as segregation and sterility. 

Rigorous experimental study of plant hybridization was initiated by Joseph
Kölreuter in 1760 and led to two critical discoveries (84). First, Kölreuter found
that a hybrid from Nicotiana paniculata x N. rustica produced no seeds, the
first "botanical mule." As a result, Kölreuter concluded that hybrid plants are
produced only with difficulty and are unlikely to occur in nature in the absence
of human intervention or disturbance to the habitat. Second, Kölreuter and
his successor, Carl Gärtner, discovered that later generation hybrids tended to
revert back to the parental forms, thus refuting the existence of constant
hybrids and supporting the orthodox view of special creation (84). The views of
Kölreuter and Gärtner on the lack of constancy of hybrids (although not
necessarily on creation) were held by most other prominent botanical hybridizers
during the eighteenth and nineteenth centuries, including Charles Darwin, John

The Future of Genetic Studies of
Complex Human Diseases 

Neil Risch and Kathleen Merikangas 



Geneticists have made substantial progress in 
identifying the genetic basis of many human 
diseases, at least those with conspicuous deter- 
minants. These successes include Huntington's 
disease, Alzheimer's disease, and some forms of 
breast cancer. However, the detection of genetic
factors for complex diseases (such as
schizophrenia, bipolar disorder, and diabetes)
has been far more complicated. There have
been numerous reports of genes or loci that 
might underlie these disorders, but few of these 
findings have been replicated. The modest na- 
ture of the gene effects for these disorders likely 
explains the contradictory and inconclusive 
claims about their identification. Despite the 
small effects of such genes, the magnitude of 
their attributable risk (the proportion of people
affected due to them) may be large because they
are quite frequent in the population, making 
them of public health significance. 

Has the genetic study of complex disorders 
reached its limits? The persistent lack of 
replicability of these reports of linkage be- 
tween various loci and complex diseases 
might imply that it has. We argue below that 
the method that has been used successfully 
(linkage analysis) to find major genes has lim- 
ited power to detect genes of modest effect, 
but that a different approach (association 
studies) that utilizes candidate genes has far 
greater power, even if one needs to test every 
gene in the genome. Thus, the future of the 
genetics of complex diseases is likely to require 
large-scale testing by association analysis. 

How large does a gene effect need to be in 
order to be detectable by linkage analysis? 
We consider the following model: Suppose a 
disease susceptibility locus has two alleles A 
and a, with population frequencies p and q = 
1 - p, respectively. There are three geno- 
types: AA, Aa, and aa. We define genotypic 
relative risks (GRR, the increased chance 
that an individual with a particular genotype 
has the disease) as follows: Let the risk for 
individuals of genotype Aa be γ times greater
than the risk for individuals with genotype
aa, a GRR of γ. We assume a multiplicative
relation for two A alleles, so that the GRR
for genotype AA is γ^2. The method of linkage
analysis we have chosen for this argument
is a popular current paradigm in which
pairs of siblings, both with the disease, are
examined for sharing of alleles at multiple
sites in the genome defined by genetic markers.
The more often the affected siblings
share the same allele at a particular site, the
more likely the site is close to the disease
gene. Using the formulas in (1), we calculate
the expected proportion Y of alleles shared by
a pair of affected siblings for the best possible
case, that is, a closely linked marker locus
(recombination fraction θ = 0) that is fully
informative (heterozygosity = 1) (2):

Y = (1 + w)/(2 + w), where w = pq(γ - 1)^2/(pγ + q)^2

If there is no linkage of a marker at a 
particular site to the disease, the siblings 
would be expected to share alleles 50% of the 
time; that is, Y would equal 0.5. Values of Y 
for various values of p and γ are given in the
third column of the table. For an allele of
moderate frequency (p is 0.1 to 0.5) that confers
a GRR (γ) of fourfold or greater, there is a
detectable deviation of Y from the null value of 
0.5. On the other hand, for an allele conferring 
a GRR of 2 or less, the expected marker-sharing 
only marginally exceeds 50%, for any allele 
frequency (p). Thus, it is clear that the use of
linkage analysis for loci conferring GRR of
about 2 or less will never allow identification
because the number of families required
(more than ~2500) is not practically achievable.
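
As a rough illustration only, the behavior of Y described above can be tabulated with a short
sketch. It assumes the sharing formula Y = (1 + w)/(2 + w) with w = pq(γ - 1)^2/(pγ + q)^2 and
q = 1 - p; the function names and the particular grid of p and γ values are hypothetical choices
made for this example.

```python
# Sketch of the affected-sibling-pair sharing calculation.
# ASSUMED formula: Y = (1 + w) / (2 + w), w = p*q*(gamma - 1)^2 / (p*gamma + q)^2.


def sharing_weight(p, gamma):
    """Return w for allele frequency p and genotypic relative risk gamma."""
    q = 1.0 - p
    return p * q * (gamma - 1.0) ** 2 / (p * gamma + q) ** 2


def expected_sharing(p, gamma):
    """Expected proportion Y of alleles shared by a pair of affected siblings."""
    w = sharing_weight(p, gamma)
    return (1.0 + w) / (2.0 + w)


# Tabulate Y over an illustrative grid; Y = 0.5 is the null value (no linkage).
for gamma in (4.0, 2.0, 1.5):
    for p in (0.01, 0.10, 0.50, 0.80):
        print(f"GRR={gamma:.1f}  p={p:.2f}  Y={expected_sharing(p, gamma):.3f}")
```

With γ = 1 the weight w is zero and Y reduces to the null value of 0.5, which is a quick sanity
check on the sketch.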

Although tests of linkage for genes of modest
effect are of low power, as shown by the
above example, direct tests of association with
a disease locus itself can still be quite strong.
To illustrate this point, we use the
transmission/disequilibrium test of Spielman et al. (3).
In this test, transmission of a particular allele 
at a locus from heterozygous parents to their 
affected offspring is examined. Under Mende- 
lian inheritance, all alleles should have a 50% 
chance of being transmitted to the next gen- 
eration. In contrast, if one of the alleles is 
associated with disease risk, it will be trans- 
mitted more often than 50% of the time. 

For this approach, we do not need families 
with multiple affected siblings, but can focus 
just on single affected individuals and their 
parents. For the same model given above, we
can calculate the proportion of heterozygous
parents as pq(γ + 1)/(pγ + q) (4). Similarly,
the probability for a heterozygote parent to
transmit the high-risk A allele is just γ/(1 + γ).
Association tests can also be performed for
pairs of affected siblings. When the locus is
associated with disease, the transmission excess
over 50% is the same as for single offspring, but
the probability of parental heterozygosity is increased
at low values of p; for higher values of p,
the probability of parental heterozygosity is decreased.
The formula for parental heterozygosity
for an affected pair of siblings for the same
genetic model as used in the first example is

$$\frac{pq(\gamma + 1)^2}{2(p\gamma + q)^2 + pq(\gamma - 1)^2}$$

Comparison of linkage and association studies. Number of families needed for identification of a
disease gene.

On the right side of the table, we present
the proportion of heterozygous parents (Het)
and the probability of transmission of the A
allele from a heterozygous parent to an affected
child [P(tr-A)] for the same values of
GRR as considered above for the example of
linkage analysis. The deviation from the null
hypothesis of 50% transmission from heterozygous
parents is substantially greater
than the excess allele sharing that is found by
linkage analysis in sibling pairs. This disparity
between the methods is particularly true
for lower values of γ (that is, with lower relative
risk). For example, for γ = 1.5, allele
sharing is at most 51%, while the A allele is
transmitted 60% of the time from heterozygous
parents.
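
The association-test quantities discussed above can be computed in a few lines. This is a sketch under the stated genetic model (relative risks γ^2, γ and 1, allele frequency p, q = 1 - p). The singleton heterozygosity and the transmission probability are the expressions given in the text; the sib-pair heterozygosity used here is derived from the same multiplicative model and should be read as an assumption of this sketch. The function name `tdt_quantities` is hypothetical.

```python
def tdt_quantities(p: float, gamma: float) -> tuple[float, float, float]:
    """Return (het_singleton, het_sib_pair, p_transmit_A).

    het_singleton : P(parent heterozygous | one affected child)
    het_sib_pair  : P(parent heterozygous | two affected children),
                    derived under the multiplicative model (assumption)
    p_transmit_A  : P(heterozygous parent transmits A to an affected child)
    """
    q = 1.0 - p
    het_singleton = p * q * (gamma + 1.0) / (p * gamma + q)
    het_sib_pair = (p * q * (gamma + 1.0) ** 2
                    / (2.0 * (p * gamma + q) ** 2 + p * q * (gamma - 1.0) ** 2))
    p_transmit_a = gamma / (1.0 + gamma)
    return het_singleton, het_sib_pair, p_transmit_a


# Illustrative values only.
for gamma in (4.0, 2.0, 1.5):
    for p in (0.1, 0.8):
        hs, hp, tr = tdt_quantities(p, gamma)
        print(f"gamma={gamma} p={p}: Het(single)={hs:.3f} "
              f"Het(pair)={hp:.3f} P(tr-A)={tr:.2f}")
```

For γ = 1.5 the transmission probability is 1.5/2.5 = 0.6, the 60% figure quoted in the text, independent of p.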

In this respect then, association studies 
seem to be of greater power than linkage 
studies. But of course, the limitation of as- 
sociation studies is that the actual gene or 
genes involved in the disease must be tenta- 
tively identified before the test can be per- 
formed. In fact, the actual polymorphism 
within the gene (or at least a polymorphism in 
strong disequilibrium) must be available. 
However, we show that this requirement is 
only daunting because of limitations imposed 
by current technological capabilities, not be- 
cause sufficient families with the disease are 
not available or the statistical power is inad- 
equate (5). For example, imagine the time 
when all human genes (say 100,000 in total) 
have been found and that simple, diallelic 
polymorphisms in these genes have been 
identified. Assume that five such diallelic
polymorphisms have been identified within
each gene, so that a total of 10 × 10^5 = 10^6
alleles need to be tested. The statistical problem
is that the large number of tests that need
to be made leads to an inflation of the type 1
error probability. For a linkage test with pairs
of affected siblings, we use a lod score (logarithm
of the odds ratio for linkage) criterion
of 3.0, which asymptotically corresponds to a
type 1 error probability α of about 10^-4. In a
linkage genome screen with 500 markers,
this significance level gives a probability
greater than 95% of no false positives. The
equivalent false positive rate for 1,000,000
independent association tests can be obtained
with a significance level α = 5 × 10^-8.
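
The arithmetic behind these two significance levels can be sketched as follows, assuming the tests are independent (as the text does). The function name `prob_no_false_positive` is an illustrative choice.

```python
def prob_no_false_positive(alpha: float, n_tests: int) -> float:
    """Probability that none of n_tests independent tests is a false
    positive when each is run at type 1 error probability alpha."""
    return (1.0 - alpha) ** n_tests


# Number of alleles to test: 100,000 genes x 5 diallelic polymorphisms x 2 alleles.
n_alleles = 100_000 * 5 * 2

linkage = prob_no_false_positive(1e-4, 500)            # 500-marker linkage screen
association = prob_no_false_positive(5e-8, n_alleles)  # genome-wide association

print(n_alleles, round(linkage, 3), round(association, 3))
```

Both probabilities come out at roughly 0.95, because 500 × 10^-4 and 10^6 × 5 × 10^-8 both equal 0.05.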

We illustrate the power of linkage versus 
association tests at different significance lev- 
els by determining the sample size N (num- 
ber of families) necessary to obtain 80% 
power (the probability of rejecting the null 
hypothesis when it is false) (6) (see table). 
With a linkage approach and a disease gene 
with a GRR of 4 or greater, the number of 
affected sibling pairs necessary to detect link- 
age is realistic (185 or 297), provided the 
allele frequency p is between 5 and 75%. For 
a gene with a GRR of 2 or less, however, the 
sample sizes are generally beyond reach (well 



over 2000), precluding their identification 
by this approach. In contrast, the required 
sample size for the association test, even al- 
lowing for the smaller significance level, is 
vastly less than for linkage, especially for af- 
fected sibling pair families when the value of 
p is small. Even for a GRR of 1.5, the sample
sizes are generally less than 1000, well within 
reason. 

Thus, the primary limitation of genome- 
wide association tests is not a statistical one 
but a technological one. A large number of 
genes (up to 100,000) and polymorphisms 
(preferentially ones that create alterations in 
derived proteins or their expression) must first 
be identified, and an extremely large number 
of such polymorphisms will need to be tested. 
Although testing such a large number of poly- 
morphisms on several hundred, or even a 
thousand families, might currently seem im- 
plausible in scope, more efficient methods of 
screening a large number of polymorphisms 
(for example, sample pooling) may be pos- 
sible. Furthermore, the number of tests we 
have used as the basis for our calculations 
(1,000,000) is likely to be far larger than nec- 
essary if one allows for linkage disequilibrium, 
which could substantially reduce the required 
number of markers and families needed for 
initial screening. 

Some of the important loci for complex 
diseases will undoubtedly be found by link- 
age analysis. However, the limitations to de- 
tecting many of the remaining genes by link- 
age studies can be overcome; numerous ge- 
netic effects too weak to identify by linkage 
can be detected by genomic association stud- 
ies. Fortunately, the samples currently col- 
lected for linkage studies (for example, af- 
fected pairs of siblings and their parents) can 
also be used for such association studies. 
Thus, investigators should preserve their 
samples for future large-scale testing. 

The human genome project can have 
more than one reward. In addition to se- 
quencing the entire human genome, it can 
lead to identification of polymorphisms for 
all the genes in the human genome and the 
diseases to which they contribute. It is a 
charge to the molecular technologists to de- 
velop the tools to meet this challenge and 
provide the information necessary to identify 
the genetic basis of complex human diseases.

For each affected sib pair, we score the number of
alleles shared ibd from each of 2N parents. Define
B_i = 1 if an allele is shared from the ith parent and
B_i = -1 if unshared. Under the null hypothesis of
no linkage, P(B_i = 1) = P(B_i = -1) = 0.5, so E(B_i) =
0 and Var(B_i) = 1. For the genetic model described
above with genotypic relative risks of γ^2, γ, and 1,
allele sharing by affected sibs is independent for
the two parents; thus, we can consider sharing of
alleles one parent at a time. Thus, for affected sib
pairs assuming θ = 0 and no linkage disequilibrium,
the formula is

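$$N = \frac{(Z_\alpha - \sigma Z_{1-\beta})^2}{2\mu^2} \qquad (1)$$

where

$$\mu = 2Y - 1, \qquad \sigma^2 = 4Y(1 - Y), \qquad Y = \frac{1 + w}{2 + w}, \qquad w = \frac{pq(\gamma - 1)^2}{(p\gamma + q)^2},$$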

Z_α = 3.72 (corresponding to α = 10^-4), and Z_{1-β}
= -0.84 (corresponding to 1 - β = 0.80). For an
association test using the transmission/disequilibrium
test, with the disease locus or a nearby locus
in complete disequilibrium, the number (N) of
families with affected singletons required for 80%
power is also calculated from formula 1. For this
case, we score the number of transmissions of allele
A from heterozygous parents. Let h be the probability
that a parent is heterozygous under the alternative
hypothesis, namely, h = pq(γ + 1)/(pγ + q). Then define
B_i = h^-0.5 if the parent is heterozygous and allele
A is transmitted; B_i = 0 if the parent is homozygous;
and B_i = -h^-0.5 if the parent is heterozygous
and transmits allele a. Under the null hypothesis,
E(B_i) = 0 and Var(B_i) = 1. Under the alternative hypothesis,
μ = E(B_i) = √h (γ - 1)/(γ + 1) and σ^2 =
Var(B_i) = 1 - h(γ - 1)^2/(γ + 1)^2. In this case, there are
two parents per family and they act independently,
so the required number (N) of families is given by
half of formula 1, where μ and σ^2 are given above.
Here, Z_α = 5.33 (corresponding to α = 5 × 10^-8). For
the same test but with affected sib pairs instead of
singletons, the number of families required is given
by half of formula 1 (transmissions from two parents
to two children) with the same formulas for μ and σ^2
as for singleton families but now using the heterozygote
frequency for parents of affected sib pairs. Using
the above formulas, we can calculate sample
sizes for the three study designs.
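
As a rough illustration of the transmission/disequilibrium calculation in these notes, the sketch below evaluates the normal-approximation count of independently scored parents, (Z_α - σ Z_{1-β})^2/μ^2, and then divides by two on the assumption that each family contributes two independently scored parents. That division, the function names, and the example values p = 0.1 and γ = 4 are assumptions of this sketch rather than statements of the authors' exact procedure.

```python
from math import sqrt
from statistics import NormalDist


def scored_parents_needed(mu: float, sigma: float,
                          alpha: float, power: float = 0.80) -> float:
    """Normal-approximation number of independent scored parents.

    z_alpha is the upper-tail critical value for a one-sided level alpha;
    z_1mb is the (negative) quantile corresponding to the desired power.
    """
    z_alpha = -NormalDist().inv_cdf(alpha)
    z_1mb = NormalDist().inv_cdf(1.0 - power)
    return (z_alpha - sigma * z_1mb) ** 2 / mu ** 2


def tdt_moments(gamma: float, h: float) -> tuple[float, float]:
    """Mean and standard deviation of the per-parent TDT score B_i
    under the alternative, given parental heterozygosity h."""
    effect = (gamma - 1.0) / (gamma + 1.0)
    mu = sqrt(h) * effect
    sigma = sqrt(1.0 - h * effect ** 2)
    return mu, sigma


# Hypothetical example: singleton families, p = 0.1, gamma = 4.
p, gamma = 0.1, 4.0
q = 1.0 - p
h = p * q * (gamma + 1.0) / (p * gamma + q)
mu, sigma = tdt_moments(gamma, h)
parents = scored_parents_needed(mu, sigma, alpha=5e-8)
families = parents / 2.0  # assumption: two independently scored parents per family
print(round(families))
```

Swapping in the sib-pair heterozygosity for h, and adjusting the divisor for the number of transmissions scored per family, gives the corresponding sib-pair design.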
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The use of a genetic map of biallelic 
markers in linkage studies 



Leonid Kruglyak 



Improvements in genetic mapping techniques have driven recent progress in human genetics. The use of single 
nucleotide polymorphisms (SNPs) as biallelic genetic markers offers the promise of rapid, highly automated 
genotyping. As maps of SNPs and the techniques for genotyping them are being developed, it is important to consider
what properties such maps must have in order for them to be useful for linkage studies. I examine how polymorphic 
and densely spaced biallelic markers need to be for extraction of most of the inheritance information from human 
pedigrees, and compare maps of biallelics with today's genome-scanning sets of microsatellite markers. I conclude that 
a map of 700-900 moderately polymorphic biallelic markers is equivalent— and a map of 1,500-3,000 superior— to the 
current 300-400 microsatellite marker sets.

The revolution in human genetics that has unfolded over the past
decade and a half has been driven largely by the development of
genetic maps. The original concept was proposed by Botstein et al.,
with restriction fragment length polymorphisms (RFLPs) as markers 1 .
The first human RFLP was quickly identified 2 , and Huntington's
disease soon became the first autosomal disorder linked to an
anonymous DNA marker 3 . The first RFLP map of the human 
genome followed shortly 4 . RFLPs were based on a variety of poly- 
morphisms at the sequence level (single nucleotide changes, inser- 
tions and deletions, repeat length polymorphisms) and were assayed 
by Southern hybridization. Although a great advance, RFLPs were 
often not very polymorphic, and they were costly and time-con- 
suming to develop and assay in large numbers. Nevertheless, these 
markers made human molecular genetics a reality and led to the 
mapping of a number of important mendelian diseases. 

The next major advance came 
with the discovery and develop- 
ment of microsatellites (STRs or 
SSLPs) as markers 5 . These loci 
are abundant, have fairly high 
polymorphism rates and can be 
assayed by PCR, leading to 
lower cost and a greater degree 
of automation. Dense maps of
microsatellites are now avail- 
able 6,7 , allowing simple men- 
delian diseases to be mapped 
with relative ease and enabling 
first searches for genetic causes 
of complex diseases by genome 
scan. However, the require- 
ments to assay the loci on gels 
and to distinguish several 
length-based alleles make it 
hard to fully automate the geno- 
typing process, and typing large 
numbers of individuals for 
markers covering the genome 

remains beyond the resources of all but a few labs. There is thus
a need to move beyond this current technology.

Fig. 1 Expected lod score (ELOD) for a dominant locus is plotted against information
content. Each circle represents the results of a simulation for one of 130 maps,
as described in Methods. The solid line shows the expected linear correlation if
information content of 0 corresponds to an ELOD of 0 and information content
of 1 corresponds to the maximum achievable ELOD of 6.02 in these pedigrees.

Recent attention has focused on the use of single nucleotide
polymorphisms (SNPs) as genetic markers. At first glance, this 
may appear to represent a step back to the days of low polymor- 
phism rates characteristic of RFLPs. However, modern technolo- 
gy should allow efficient assays of SNPs in numbers sufficiently 
large to offset their lower polymorphism rates, as discussed below. 
SNPs offer a number of important advantages over microsatel- 
lites. They are highly abundant, with classic estimates of more 
than 1 per 1,000 base pairs, or more than 3 million in the 
genome 8,9 . To date, more than 1,000 PCR-amplifiable SNP markers
have been discovered and mapped (D. Wang, pers. comm.).
Because SNPs have only two (common) alleles (hence the term 
'biallelics'), genotyping them requires only a plus/minus assay 
rather than a length measurement, permitting easier automation. 
Several non-gel-based assays have been proposed 10-14 , with high- 
density oligonucleotide arrays 
currently showing great promise 
for typing large numbers of 
biallelic markers in parallel 15,16 .

Here I consider the feasibility
of carrying out linkage studies 
with a genetic map based on bial- 
lelic markers. The key questions 
are: What level of polymorphism 
is required? and How many 
markers adequately cover the 
genome? These questions are 
addressed below. 

Assumptions 

The effects of marker density 
and polymorphism were exam- 
ined by simulating pedigree 
genotype data and measuring 
the information content 17,18 for
a broad range of map densities
and polymorphism levels (see
Methods for simulation details).
Information content measures
the fraction of inheritance
information extracted by the
map relative to that which
would be extracted by an infinitely dense polymorphic map. Thus,
an information content of 1 reflects complete information, whereas
an information content of 0 reflects no information. Information
content incorporates both marker density and polymorphism
in a single general measure of map quality that is independent of
assumptions about a particular disease locus. It also closely predicts
the power of a map to detect linkage, for example, as measured
by the expected lod score (ELOD; Fig. 1).

Fig. 2 Information content for five map densities is plotted against the frequency
of the more common of the two alleles of a biallelic marker (x-axis: frequency of
the common allele). The circles show actual simulation data points.

Table 1 • Information content for biallelics

spacing (cM)   50-50   60-40   70-30   80-20
 1             0.88    0.88    0.87    0.84
 2             0.75    0.76    0.73    0.69
 3             0.65    0.65    0.63    0.56
 4             0.58    0.56    0.53    0.48
 5             0.50    0.49    0.46    0.41
 6             0.45    0.43    0.41    0.36
 7             0.39    0.39    0.37    0.32
 8             0.35    0.35    0.33    0.28
 9             0.32    0.31    0.29    0.25
10             0.29    0.28    0.26    0.23

A further column (90-10, whose 1-cM value of 0.73 is quoted in the text) is only
partly legible; its values, in order of increasing spacing, read 0.73, 0.55, 0.42,
0.34, 0.24, 0.22, 0.19, 0.17, 0.15.

Table 2 • Information content for microsatellites

The markers were assumed to be evenly spaced, and informa- 
tion content was measured at a location halfway between two mark- 
ers, where it is expected to be lowest. For clarity, a single pedigree 
structure is used throughout: first-cousin pairs with parents but 
not grandparents available for genotyping. Extensive simulations 
show that although the absolute numbers differ somewhat for other
pedigree structures, all the main conclusions about the relative 
importance of marker polymorphism and density continue to hold. 

How polymorphic do biallelic markers need to be? 

Biallelic markers vary in their rates of polymorphism: the more 
common allele can range in frequency from 50% to nearly 100%. 
In considering a map of biallelic markers, it is important to ask 
whether only near-perfect (50-50) biallelics are useful or whether
less polymorphic markers can provide comparable amounts of
information. To answer this question, I measured information content
in simulations of maps of biallelic markers with varying degrees
of polymorphism.

The results (Fig. 2, Table 1) clearly indicate that at higher map
densities, allele frequency has only a small effect on information
content in the range of frequency distributions from 50-50 to 80-20.
Specifically, a 1-cM map of 60-40 biallelics provides an information
content of 0.88, essentially the same as perfect 50-50 biallelics
at this density, while 70-30 biallelics provide an information content
of 0.87, and 80-20 biallelics provide an information content of
0.84. The information content drops to 0.73 for 90-10 biallelics.
Thus, the use of biallelic markers with frequency distribution as 
skewed as 80-20 leads to little reduction in the information content 
of a dense map. For sparser maps of 5-10 cM, a similar conclusion 
holds for marker allele frequency distributions as skewed as 70-30. 
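
To relate these allele frequency distributions to the heterozygosity figures used for comparing marker types, the sketch below computes expected heterozygosity as one minus the sum of squared allele frequencies. This is the standard random-mating expression and is an assumption of the sketch, as is the function name `heterozygosity`.

```python
def heterozygosity(freqs: list[float]) -> float:
    """Expected heterozygosity, 1 - sum(p_i^2), for allele frequencies
    that sum to 1 (random mating assumed)."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(f * f for f in freqs)


# Biallelic markers from 50-50 to 90-10.
for common in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(f"{round(common * 100)}-{round((1 - common) * 100)}: "
          f"{heterozygosity([common, 1 - common]):.2f}")

# A microsatellite with four equally frequent alleles.
print(f"4 equal alleles: {heterozygosity([0.25] * 4):.2f}")
```

An 80-20 biallelic has a heterozygosity of 0.32 against 0.50 for a 50-50 marker, yet the 1-cM information content reported above falls only from 0.88 to 0.84, which is the point of this section.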

How dense does a map of biallelic markers need to be? 

Although there is a limit on how polymorphic a biallelic marker 
can be (a 50-50 distribution of the two alleles), there is essentially 
no theoretical limit on map density (or marker number), as rea- 
sonably polymorphic SNPs can be found roughly every 1 kb, or 
about 3 million times in the human genome (see above). Thus, 
one answer to how many markers are needed is that more is always 
better 1 . For common linkage study designs, however, the addition 
of markers provides diminishing returns once most of the inheritance
information has been extracted. As shown above, a 1-cM
map of 50-50 biallelic markers extracts 88% of the available information,
and it is unlikely that higher information content is needed
in an initial screen for linkage. What is the informational cost
of decreasing the density of the map? Simulation results (Fig. 2)
show that map density plays a more critical role than marker polymorphism.
A 2-cM map provides information content of 0.75, a
3-cM map 0.65 and a 5-cM map 0.50. Together with the results of
the previous section, these numbers lead to the conclusion that for
initial linkage studies it is desirable to screen a dense (1-2-cM) map
of moderately polymorphic (50-50 to 80-20) biallelic markers.
Interesting regions can then be followed up with all available (bial- 
lelic and microsatellite) markers. 

It is worth noting that there are two separate issues regarding
map density: how many markers exist and how many markers can
be genotyped rapidly and cost-effectively. Although current
microsatellite maps cover the genome at an average spacing of less
than 1 cM (with more than 5,000 markers in the final Généthon
map alone 7 ), genotyping more than a few hundred markers in a
large collection of families remains beyond the power of today's
technology and research budgets. Thus, the practical limit on the
number of biallelic markers will depend on the techniques for 
marker development and genotyping. Nonetheless, it is interest- 
ing to compare such maps with current maps of microsatellite 
markers. Such a comparison is carried out in the next section. 

Comparison of maps based on biallelics and microsatellites

Current genome scans typically employ a 10-cM map of microsatellite
markers for the initial screen 19,20 , followed by denser coverage of
regions that yield interesting results. (Although one could employ a
'staged search' strategy of starting with a sparser 20-40-cM map
and then increasing the density in all moderately positive
regions 21,22 , economies of scale in large genotyping labs usually
argue for a one-stage initial scan: using a single optimized set of
markers for all projects is more efficient than 'filling in' different
regions for each.) Microsatellite markers typically vary between 0.65
and 0.8 in heterozygosity (for instance, an average of 0.7 in the final
Généthon map 7 ), and for simplicity I will use microsatellites with
four equally frequent alleles (heterozygosity of 0.75) as representative
in the following comparisons with biallelics with two equally
frequent alleles (heterozygosity of 0.5); results for other values are
given in Tables 1 and 2. Intuitively, one would expect two closely
linked biallelics to provide the same information as one microsatellite,
and simulations largely confirm this intuition. A 10-cM map
of microsatellites achieves information content of 0.54 (Fig. 3). The
same information content is provided by a 4.5-cM map of biallelic
markers. A denser 5-cM microsatellite map achieves an information
content of 0.75, as does a 2-cM map of biallelics. In general,
maps of biallelic markers at about 2.25-2.5 times the density of
microsatellites provide a comparable information content. A 10-cM
map of 300 microsatellite markers can therefore be replaced by
a 4-cM map of 750 biallelic markers. These conclusions are in rough
agreement with the results of an earlier study of the trade-off
between marker spacing and polymorphism 23 .
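
The equivalence between map densities can be approximated from the 50-50 column of Table 1 by linear interpolation. The sketch below is an illustration only: linear interpolation between tabulated spacings, the helper names, and a genome length of 3,000 cM (implied by 300 markers at 10-cM spacing) are assumptions, not part of the simulations.

```python
# 50-50 column of Table 1: information content by marker spacing (cM).
SPACING_CM = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
INFO_50_50 = [0.88, 0.75, 0.65, 0.58, 0.50, 0.45, 0.39, 0.35, 0.32, 0.29]

GENOME_CM = 3000.0  # assumed genome length implied by 300 markers at 10 cM


def spacing_for_info(target: float) -> float:
    """Biallelic spacing (cM) whose interpolated information content
    equals target; information content decreases as spacing grows."""
    points = list(zip(SPACING_CM, INFO_50_50))
    for (s0, i0), (s1, i1) in zip(points, points[1:]):
        if i1 <= target <= i0:
            return s0 + (i0 - target) / (i0 - i1) * (s1 - s0)
    raise ValueError("target outside the tabulated range")


def markers_needed(spacing_cm: float) -> int:
    """Number of evenly spaced markers covering the assumed genome."""
    return round(GENOME_CM / spacing_cm)


s = spacing_for_info(0.54)  # match a 10-cM microsatellite map
print(round(s, 2), markers_needed(s), markers_needed(4.0))
```

Interpolating to 0.54 gives a spacing midway between the 4-cM and 5-cM rows, consistent with the 4.5-cM figure in the text, and a 4-cM map corresponds to 750 markers under the assumed genome length.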

As technology improves, it is likely that screening a much denser
map of biallelic markers will be cheaper and easier than carrying
out today's genome scans employing microsatellites 13,16 . There are
reasons to employ such denser maps. As shown above, current scan
densities lead to considerable loss of information. This problem is
more serious for data sets consisting of more distantly related affecteds
or of progeny of consanguineous marriages used in homozygosity
mapping 24 . It is therefore worth noting that a 1-cM map of
biallelics (about 3,000 markers) yields much higher information
content than a 10-cM map of microsatellites (0.88 vs. 0.54), and is
superior to a 5-cM microsatellite map (0.88 vs. 0.75).



Practical linkage analysis using biallelic markers 

Because of the lower polymorphism rates of biallelic markers, it is
critical to consider many linked markers simultaneously; indeed,
all the above results assume complete multipoint analysis of all
markers on a chromosome. Such multipoint analysis is even more
important for biallelics than for microsatellites. Fortunately, recently
developed algorithms and software allow multipoint analysis with
an essentially unlimited number of linked markers to be carried out
for sib pairs 17 as well as for general pedigrees of moderate size 18 .
These methods can also be used for automatic haplotype reconstruction,
avoiding the tedious prospect of haplotyping many biallelics
by hand. The one remaining challenge is extending multipoint
analysis with many markers to large multi-generational families,
although even here the situation is improving 25 .

Discussion 

The results presented here clearly demonstrate that the use of a
genetic map of biallelic markers for linkage studies is feasible on
theoretical grounds. It is not necessary to find only 'perfect' 50-50
biallelics: markers with allele frequency distributions as skewed as
70-30 or even 80-20 are almost as useful in a dense map. This result
should allay the concern that markers discovered in one population
may not be sufficiently informative in other populations with different
allele frequencies. A 1-2-cM map of moderately polymorphic
biallelic markers is superior to today's microsatellite screening
sets for extracting inheritance information and should provide a
more efficient tool for initial genome scans.

Even denser maps should enable novel study designs for dissecting
genetically complex phenotypes. In particular, genome scans
for linkage disequilibrium (LD) and association may become practical 26-28 .
Because LD mapping relies on detecting recombinationally
conserved regions around an ancestral mutation, the required
map density will vary with the age and history of the study population,
with very dense maps (spacing of 10 kb or less) likely to be
needed for LD scans in a mixed general population. A more promising
approach may be to screen in parallel functional (coding) biallelic
polymorphisms in many genes for direct association (rather
than LD) with disease 26-28 .

Maps of biallelic markers and the technology to genotype them
should be forthcoming 15,16 , and the resulting progress in human
genetics will be exciting to watch.

Methods 

Simulations. Segregation of chromosomes of 100-cM length with evenly
spaced markers was simulated. For biallelics, the frequencies of the common
allele were 0.5, 0.6, 0.7, 0.8, 0.9, 0.95 and 0.99. For microsatellites, equally frequent
alleles were assumed, with allele numbers of 3, 4, 5, 10, 20 and 100.
Marker spacings of 1, 2, . . . , 10 cM were examined. Each simulation consisted
of 100 replicates of 10 cousin pairs each. Information content was computed
with GENEHUNTER 18 . Information content was measured halfway
between the two markers closest to the middle of the chromosome. For
ELOD computation, a dominant disease locus with full penetrance, no phenocopies
and allele frequency of 0.001 was assumed to lie halfway between
two markers, and chromosomes were simulated assuming that both cousins
were affected. GENEHUNTER was used to compute multipoint lod scores.
The relationship between information content and ELOD is preserved for
other assumptions about the disease locus (data not shown). Simulation software
used to generate the data is available from the author and can be used
to explore additional map properties and pedigree structures.
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Accessing Genetic Information with 
High-Density DNA Arrays 

Mark Chee, Robert Yang, Earl Hubbell, Anthony Berno, 
Xiaohua C. Huang, David Stern, Jim Winkler, David J. Lockhart, 
Macdonald S. Morris, Stephen P. A. Fodor

Rapid access to genetic information is central to the revolution taking place in molecular 
genetics. The simultaneous analysis of the entire human mitochondrial genome is de- 
scribed here. DNA arrays containing up to 135,000 probes complementary to the 16.6- 
kilobase human mitochondrial genome were generated by light-directed chemical syn- 
thesis. A two-color labeling scheme was developed that allows simultaneous compar- 
ison of a polymorphic target to a reference DNA or RNA. Complete hybridization patterns 
were revealed in a matter of minutes. Sequence polymorphisms were detected with
single-base resolution and unprecedented efficiency. The methods described are generic
and can be used to address a variety of questions in molecular genetics including
gene expression, genetic linkage, and genetic variability.

A central theme in modern genetics is the
relation between genetic variability and phenotype.
To understand genetic variation and
its consequences on biological function, an 
enormous effort in comparative sequence 
analysis will need to be carried out. Conven- 
tional nucleic acid sequencing technologies 
make use of analytical separation techniques 
to resolve sequence at the single nucleotide 
level (1, 2). However, the effort required
increases linearly with the amount of se- 
quence. In contrast, biological systems read, 
store, and modify genetic information by mo- 
lecular recognition (3). Because each DNA 
strand carries with it the capacity to recognize 
a uniquely complementary sequence through 
base pairing, the process of recognition, or 
hybridization, is highly parallel, as every nu- 
cleotide in a large sequence can in principle 
be queried at the same time. Thus, hybrid- 
ization can be used to efficiently analyze 
large amounts of nucleotide sequence. In one 
proposal, sequences are analysed by hybrid- 
ization to a set of oligonucleotides represent- 
ing all possible subsequences (4). A second 
approach, used here, is hybridization to an
array of oligonucleotide probes designed to 
match specific sequences. In this way the 
most informative subset of probes is used. 
Implementation of these concepts relies on 
recently developed combinatorial technolo- 
gies to generate any ordered array of a large 
number of oligonucleotide probes (5).

The fundamentals of light-directed oligonucleotide
array synthesis have been described
(5, 6). Any probe can be synthesized
at any discrete, specified location in
the array, and any set of probes composed of
the four nucleotides can be synthesized in a
maximum of 4N cycles, where N is the
length of the longest probe in the array. For
example, the entire set of ~10^12 20-nucleotide
oligomer probes, or any desired subset,
can be synthesized in only 80 coupling cycles.
The number of different probes that
can be synthesized is limited only by the
physical size of the array and the achievable
lithographic resolution (7).

An array consisting of oligonucleotides
complementary to subsequences of a target
sequence can be used to determine the identity
of a target sequence, measure its amount,
and detect differences between the target
and a reference sequence. Many different
arrays can be designed for these purposes.
One such design, termed a 4L tiled array, is
depicted in Fig. 1A. In each set of four
probes, the perfect complement will hybridize
more strongly than mismatched probes.
By this approach, a nucleic acid target of
length L can be scanned for mutations with
a tiled array containing 4L probes. For example,
to query the 16,569 base pairs (bp) of
human mitochondrial DNA (mtDNA), only
66,276 probes of the possible ~10^9 15-nucleotide
oligomers need to be used.
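
The counting argument and the idea of a four-probe set can be sketched as follows. This is a toy illustration, not the array design itself: the substituted position within each probe (`query_offset`), the use of the reverse complement as the probe orientation, the handling of sequence ends (only full windows are tiled), and all function names are assumptions made for the example.

```python
BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def tile(target: str, probe_len: int = 15, query_offset: int = 7):
    """For each full window of the target, build four probes that are
    complementary to the window except at the queried position, which
    is set to each of the four bases in turn.

    Returns a list of (queried_position, {base: probe}) pairs, where
    the probe keyed by a base is the perfect complement of a target
    carrying that base at the queried position.
    """
    sets = []
    for start in range(len(target) - probe_len + 1):
        window = target[start:start + probe_len]
        probes = {}
        for base in BASES:
            variant = window[:query_offset] + base + window[query_offset + 1:]
            probes[base] = reverse_complement(variant)
        sets.append((start + query_offset, probes))
    return sets


# Counting checks from the text.
print(4 * 16_569)        # probes in a 4L tiling of the 16,569-bp genome
print(4 * 20)            # coupling cycles for 20-nucleotide probes
print(4 ** 15, 4 ** 20)  # possible 15- and 20-nucleotide oligomers

toy_target = "ACGTTGCAACGTAGCTAGGCTA"  # hypothetical sequence
for pos, probes in tile(toy_target)[:2]:
    print(pos, toy_target[pos], probes[toy_target[pos]])
```

In each four-probe set, the probe keyed by the base actually present in the target is the perfect complement, which is the one expected to hybridize most strongly.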

The use of a tiled array of probes to read a 
target sequence is illustrated in Fig. 1C. A 
tiled array of 15-nucleotide oligomers varied

Fig. 3. Human mitochondrial genome on a chip. (A) An image of the
array hybridized to 16.6 kb of mitochondrial target RNA (L strand). The
16,569-bp map of the genome is shown, and the H strand origin of
replication (O_H), located in the control region, is indicated. (B) A portion of
the hybridization pattern magnified. In each column there are five
probes: A, C, G, T, and Δ, from top to bottom. The Δ probe has a single-base
deletion instead of a substitution and hence is 24 instead of 25 bases in
length. The scale (1 mm) is indicated by the bar beneath the image. Although
there is considerable sequence-dependent intensity variation, most of
the array can be read directly. The image was collected at a resolution
of ~100 pixels per probe cell. (C) The ability of the array to detect and read
single-base differences in a 16.6-kb sample is illustrated. Two different target sequences were hybridized
in parallel to different chips. The hybridization patterns are compared for four different positions in the
sequence. Only the P^25,13 probes are shown. The top panel of each pair shows the hybridization of the mt3
target, which matches the chip P^0 sequence at these positions. The lower panel shows the pattern
generated by a sample from a patient with Leber's hereditary optic neuropathy (LHON). Three known
pathogenic mutations, LHON3460, LHON4216, and LHON13708, are clearly detected. For comparison,
the fourth panel in the set shows a region around position 11,778 that is identical in both samples.



provide the foundation for a powerful genetic
analysis technology. The method
can be used to characterize the spectrum
of sequence variation in a population and
can be applied to the analysis of many
genes in parallel. In the case of human
mtDNA, we simultaneously analyzed the
control region, 13 protein coding genes,
22 tRNA genes, and 2 ribosomal RNA
genes. The methods described here can be
applied to other research areas in molecular
genetics; for example, the ability to
identify and sequence polymorphisms provides
a basis for genetic mapping. The
specificity of oligonucleotide hybridization
and the scalability of the method
suggest the possibility of a dedicated array
that could be used to generate a high-resolution
genetic map of an entire genome
in a single experiment. Likewise,
the concepts and techniques described
here have been used to develop approaches
for mRNA identification and the large-scale,
parallel measurement of expression
levels (24). Thus, the sequence of a gene,
its spectrum of change in the population,
its chromosomal location, and its dynamics
of expression (all essential to a full
understanding of function) can be determined
with high-density probe arrays. The
challenge now is to synthesize and read
probe arrays at even higher density. For
example, a 2 cm by 2 cm array, synthesized
with probes occupying 1-µm synthesis
sites in a 4L tiling, could query the entire
coding content of the human genome,
estimated at 100,000 genes,
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